Research Question: Do countries with higher percentage of females with their needs for family planning met with modern methods have lower neonatal mortality rates?
Introduction
The World Health Organization has made one of its sustainable development goals increasing the number of women who use modern family planning methods. Family planning methods are contraceptive methods that allow women to be sexually active without the worry of becoming pregnant. There are many reasons to increase the amount of women who use family planning including population control, increased maternal health, decreased number of unwanted children, and many more reasons. One reason that increasing family planning may be a positive thing is that it may allow the women to be better prepared for having a child and increase the amount of care the neonate can have access to.
This leads me to the question of whether countries with higher percentage of females with their needs for family planning met with modern methods have lower neonatal mortality rates. I will also be looking at whether family planning has an association with 5 year mortality as well as adolescent birth rates.
Libraries and Packages
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.2 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.3 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Attaching package: 'data.table'
The following objects are masked from 'package:lubridate':
hour, isoweek, mday, minute, month, quarter, second, wday, week,
yday, year
The following objects are masked from 'package:dplyr':
between, first, last
The following object is masked from 'package:purrr':
transpose
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
library(knitr)library(rnaturalearth)
The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
which was just loaded, will retire in October 2023.
Please refer to R-spatial evolution reports for details, especially
https://r-spatial.org/r/2023/05/15/evolution4.html.
It may be desirable to make the sf package available;
package maintainers should consider adding sf to Suggests:.
The sp package is now running under evolution status 2
(status 2 uses the sf package in place of rgdal)
Support for Spatial objects (`sp`) will be deprecated in {rnaturalearth} and will be removed in a future release of the package. Please use `sf` objects with {rnaturalearth}. For example: `ne_download(returnclass = 'sf')`
library(rnaturalearthdata)
Attaching package: 'rnaturalearthdata'
The following object is masked from 'package:rnaturalearth':
countries110
library(sf)
Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
Introduction
Adding the Data and Merging and Manipulating it
#getwd()#setwd("/Users/christineraj/Documents/PM566/")#uploading the seperate data sheets and getting rid of all the labels from the first 5 rowswhodata1labels <- data.table::fread("whs2023_annex1.csv")labels1 <- whodata1labels[1:5, ]whodata1 <- whodata1labels[-(1:5), ]whodata2labels <- data.table::fread("whs2023_annex2.csv")labels2 <- whodata2labels[1:5, ]whodata2 <- whodata2labels[-(1:5), ]# merging who datamerge(# Datax = whodata1, y = whodata2, # List of variables to matchby.x ="V1",by.y ="V1", # Which obs to keep?all.x =TRUE, ) %>%nrow()
[1] 210
whodata <-merge(# Datax = whodata1, y = whodata2, # List of variables to matchby.x ="V1",by.y ="V1", # Which obs to keep?all.x =TRUE, all.y =FALSE )#head(whodata[, list(V1, V2.x, V3.y)], n = 8)whodata <- whodata[-(1:4), ]#merging labels for reference#merge(# Data# x = labels1, # y = labels2, # List of variables to match# by.x = "V1",# by.y = "V1", # Which obs to keep?# all.x = TRUE, # allow.cartesian = TRUE# ) %>% nrow()#labels <- merge(# Data# x = labels1, # y = labels2, # List of variables to match# by.x = "V1",# by.y = "V1", # Which obs to keep?# all.x = TRUE, # allow.cartesian = TRUE# )#head(labels[, list(V1, V2.x, V3.y)], n = 5)#renaming key variables for conveniencewhodata <- whodata %>%rename(countries = V1)whodata <- whodata %>%rename(year5mortality = V14.x)whodata <- whodata %>%rename(neonatalmortality = V15.x)whodata <- whodata %>%rename(familyplanning = V6.y)whodata <- whodata %>%rename(adolescentbirths15_19 = V8.y)whodata <- whodata %>%rename(adolescentbirths10_14 = V9.y)
Methods
In order to answer the question posed above, data from the World Health Organization (WHO) was obtained. The WHO obtains yearly statistics by country to see how countries are working towards the sustainable development goals that the WHO has set. The data is collected and organized by the WHO and they release the comparable estimates of each country for each sustainable development goal. For this project, I looked at the data by country and obtained each countries proportion of women who have their need for family planning met. This is the independent variable for this project. As the primary outcome I obtained the number of neonatal mortality by country per 1000 neonates. For secondary outcomes I looked at the number of 5 year mortality per 1000 people by country, and the adolescent birth rate of both adolescents aged 10-14 and 15-19. I uploaded this data from the WHO and started by ensuring that there were no labels to interfere with the analysis within the data set and removed all of those. I ensured that each variable was coded as a numeric variable except for the countries variable which was coded as a character. As the primary and independent variables did not have data from certain countries, I eliminated those countries from the data set as they would add no value to the analysis. I did not eliminate countries with unavailable data for the secondary outcomes as this was less important and I did not want to skew the data of the main outcome. The percentage of countries with family planning is skewed to the right somewhat with a range of 2.1 to 89.6. There are no outliers. Neonatal mortality is slightly skewed to the left with no outliers and a range of 1 to 39. 5 year mortality is skewed to the left with 3 outliers and a range of 2 to 115. Adolescent birth of girls aged 15-19 is skewed to the left with 2 outliers and a range of 2.4 to 184.4. Adolescent birth of girls aged 10-14 is heavily skewed to the left with 6 outliers and a range from 0.0 to 10.7. For data exploration I used boxplots and summary statistics to view the data. To answer the main question I used correlation statistics, scatter plots with summary statistic lines, and also an interactive scatter plot.
Going through the EDA checklist
Checking dimensions and headers and footers
#dim(whodata)#head(whodata)#tail(whodata)#found out there is an extra row that is meant as a label for years that needs to be deleted so deleting this row and then rechecking footerrow_numbers <-which(whodata$countries =="WHO region")whodata <- whodata[-row_numbers, ]#tail(whodata)
Checking Variable Types
#checking the variable typesvariable_types <-sapply(whodata, class)#print(variable_types)is_numeric <-is.numeric(whodata$familyplanning)#print(is_numeric)#Finding out that all the variables are coded as characters rather than numberic so changing all relevant variables to numeric and then rechecking all variable typeswhodata$familyplanning <-as.numeric(whodata$familyplanning)
There is a moderate negative correlation between family planning and neonatal mortality. It has an R coefficient of -0.345. There is also a downward trend in the data when graphing it on a scatter plot by country. There is more of a change in effect of decreased neonatal mortality once over 50% of the population uses family planning methods. Before 50% there is a minimal effect of increasing modern family and the decrease in neonatal mortality. There is also a negative correlation between family planning and 5 year mortality. There is an R coefficient of -0.419. There is a steady downward trend between an increase in family planning and a decrease in 5 year mortality. There is very little negative correlation between family planning and adolescent birth rate of girls age 15-19. The R coefficient is -0.275. There is also a very small negative correlation between family planning and adolescent birth rate of girls age 10-14. The R coefficient is -0.284. For both adolescent birth rate of those ages 15-19 and 10-14 there does not seem to be a very strong downward trajectory of the smooth fit line in the scatter plot which makes it seem like there is not a large effect of family planning on adolescent mortality.
Publishable Graphs - Family Planning vs Neonatal Mortality
whodata %>%ggplot() +geom_point(mapping =aes(x = whodata$familyplanning, y = whodata$neonatalmortality)) +stat_smooth(mapping =aes(x = familyplanning, y = neonatalmortality)) +labs(title ="Number of Neonatal Mortality by Percentage of Modern Family Planning Use in each Country") +labs(x =expression("Percentage of Modern Family Planning"), y ="Number of Neonatal Mortality per 1000")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
interactiveplot <-plot_ly(data = whodata, x =~whodata$familyplanning, y =~whodata$neonatalmortality, text =~whodata$countries, type ="scatter", mode ="markers") interactiveplot <- interactiveplot %>%layout(title ="Neonatal Mortality by Percentage of Modern Family Planning with Country Names",xaxis =list(title ="Percentage of Modern Family Planning"),yaxis =list(title ="Number of Neonatal Mortality per 1000"))interactiveplot
Publishable Graphs - Family Planning vs 5 year Mortality
whodata %>%ggplot() +geom_point(mapping =aes(x = whodata$familyplanning, y = whodata$year5mortality)) +stat_smooth(mapping =aes(x = familyplanning, y = year5mortality)) +labs(title ="5 year Mortality by % of Modern Family Planning Use in Each Country") +labs(x =expression("Percentage of Modern Family Planning"), y ="Number of 5 year Mortality per 1000")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
mortality5 <-plot_ly(data = whodata, x =~whodata$familyplanning, y =~whodata$year5mortality, text =~whodata$countries, type ="scatter", mode ="markers") mortality5 <- mortality5 %>%layout(title ="5 Year Mortality by Percentage of Modern Family Planning with Country Names",xaxis =list(title ="Percentage of Modern Family Planning"),yaxis =list(title ="Number of 5 year Mortality per 1000"))mortality5
Publishable Graphs - Family Planning vs Adolescent (15-19) Births
whodata %>%ggplot() +geom_point(mapping =aes(x = whodata$familyplanning, y = whodata$adolescentbirths15_19)) +stat_smooth(mapping =aes(x = familyplanning, y = adolescentbirths15_19)) +labs(title ="Adolescent Birth (15-19) by % of Modern Family Planning Use in Each Country") +labs(x =expression("Percentage of Modern Family Planning"), y ="Adolescent Birth (age 15-19) per 1000")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
adolescent1519 <-plot_ly(data = whodata, x =~whodata$familyplanning, y =~whodata$adolescentbirths15_19, text =~whodata$countries, type ="scatter", mode ="markers") adolescent1519 <- adolescent1519 %>%layout(title ="Adolescent Birth 15-19 by Percentage of Modern Family Planning with Country Names",xaxis =list(title ="Percentage of Modern Family Planning"),yaxis =list(title ="Adolescent Births Age 15-19 per 1000"))adolescent1519
Warning: Ignoring 4 observations
Publishable Graphs - Family Planning vs Adolescent (10-14) Births
whodata %>%ggplot() +geom_point(mapping =aes(x = whodata$familyplanning, y = whodata$adolescentbirths10_14)) +stat_smooth(mapping =aes(x = familyplanning, y = adolescentbirths10_14)) +labs(title ="Adolescent Birth (10-14) by % of Modern Family Planning Use in Each Country") +labs(x =expression("Percentage of Modern Family Planning"), y ="Adolescent Birth (age 10-14) per 1000")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
adolescent1014 <-plot_ly(data = whodata, x =~whodata$familyplanning, y =~whodata$adolescentbirths10_14, text =~whodata$countries, type ="scatter", mode ="markers") adolescent1014 <- adolescent1014 %>%layout(title ="ADolescent Birth (10-14) by Percentage of Modern Family Planning with Country Names",xaxis =list(title ="Percentage of Modern Family Planning"),yaxis =list(title ="Adolescent Births Age 10-14 per 1000"))adolescent1014
world <-ne_countries(scale ="medium", returnclass ='sf')mapdata <-merge(world, whodata, by.x ="name_long", by.y ="countries", all.x =TRUE)mapdata %>%ggplot() +geom_sf(data = mapdata, aes(fill = familyplanning)) +scale_fill_gradient(low ="lightblue", high ="darkblue") +labs(title ="Family Planning by Country")
mapdata %>%ggplot() +geom_sf(data = mapdata, aes(fill = neonatalmortality)) +scale_fill_gradient(low ="lightgreen", high ="darkgreen") +labs(title ="Neonatal Mortality by Country")
Conclusion
There seems to be an association between increased family planning and decreased neonatal mortality. This may be the case because more people are able to choose when they want to have a baby and can be better prepared for taking care of that baby when it is born when there is more family planning. Another possible reason this could be true is because when there is increased family planning, women are more capable of planning when they want to have a baby and they can choose a time when they are better able to get prenatal care to ensure the health of the baby. There also seems to be an association between increased family planning and 5 year mortality. Most likely the same reasons apply to why this trend exists. One thing that was interesting was there did not seem to be a strong association between adolescent birth rate of girls age both 15-19 and 10-14 and family planning. This may be the case because family planning methods are probably not marketed to adolescents at the same level they are to adults. This can cause those who become sexually active earlier in life to potentially be more likely to not use family planning methods. It would be interesting to see the use of family planning methods based on age and see if most adolescents regardless of country do not use family planning methods which would explain the lack of association. One potential limitation of this study is that there might have been several countries that we eliminated that had no neonatal mortality data and family planning data. These countries could have strengthened our association because the countries which do not have the capabilities of gathering this data may more likely be from a lower income country and their access to family planning and health care is limited. Therefore, it might be likely that they had low rates of family planning and higher rates of neonatal mortality.